Model selection for partial least squares regression
Abstract
Partial least squares (PLS) regression is a powerful and frequently applied technique in multivariate statistical process control when the process variables are highly correlated. Selecting the number of latent variables needed to build a representative model is an important issue. A metric frequently used by chemometricians to determine the number of latent variables is Wold's R criterion, whilst more recently a number of statisticians have advocated the use of the Akaike Information Criterion (AIC). In this paper, Wold's R criterion and AIC are compared, by means of a simulation study, for selecting the number of latent variables to include in a PLS model that will form the basis of a multivariate statistical process control representation. It is shown that neither Wold's R criterion nor AIC exhibits satisfactory performance. In contrast, the adjusted Wold's R criterion is shown to perform satisfactorily in terms of the number of times the known true model is selected. Two industrial applications are then used to demonstrate the methodology. The first relates to the modelling of product quality using data from an industrial fluidised bed reactor, and the second focuses on an industrial NIR data set. The results are consistent with those of the simulation studies. © 2002 Elsevier Science B.V. All rights reserved.
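The abstract does not define the criteria it compares. As a rough illustration only, the sketch below assumes the commonly cited form of Wold's R criterion: the ratio of successive cross-validated PRESS values, R = PRESS(a+1)/PRESS(a), with latent variables added until R exceeds a threshold. It also assumes that a threshold of 1 gives the unadjusted criterion and that a smaller threshold such as 0.90 or 0.95 gives the adjusted one. The function name `select_latent_variables` and the PRESS values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def select_latent_variables(press: Sequence[float], threshold: float = 1.0) -> int:
    """Pick the number of latent variables from successive PRESS values.

    press[a - 1] is assumed to be the cross-validated PRESS of a PLS model
    with `a` latent variables. The ratio R = PRESS(a + 1) / PRESS(a) is
    checked for a = 1, 2, ...; the first `a` whose ratio exceeds `threshold`
    is returned. threshold=1.0 is assumed to be the unadjusted criterion,
    and values such as 0.90 or 0.95 the adjusted one.
    """
    if not press:
        raise ValueError("press must contain at least one value")
    if any(p <= 0 for p in press):
        raise ValueError("PRESS values must be positive")

    for a in range(1, len(press)):
        ratio = press[a] / press[a - 1]  # R for going from a to a + 1 components
        if ratio > threshold:
            return a  # adding component a + 1 does not improve prediction enough
    return len(press)  # no ratio exceeded the threshold; keep all candidates


if __name__ == "__main__":
    # Made-up PRESS values for models with 1..5 latent variables.
    example_press = [100.0, 60.0, 50.0, 48.0, 49.0]
    print(select_latent_variables(example_press))                  # 4
    print(select_latent_variables(example_press, threshold=0.95))  # 3
```

With these made-up values the successive ratios are 0.60, 0.83, 0.96 and 1.02, so the unadjusted rule keeps four latent variables while a 0.95 threshold stops at three. This mirrors the idea that a stricter threshold yields a more parsimonious model, but it says nothing about which choice the paper's simulations favour.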
Similar resources
Application of Genetic Algorithms for Pixel Selection in MIA-QSAR Studies on Anti-HIV HEPT Analogues for New Design Derivatives
A quantitative structure-activity relationship (QSAR) analysis was carried out using chemometric methods on a series of 107 anti-HIV HEPT compounds with antiviral activity. Bi-dimensional images were used to calculate some pixels, and multivariate image analysis was applied to QSAR modelling of the anti-HIV potential of HEPT analogues by means of multivariate calibration,...
Pixel selection by successive projections algorithm method in multivariate image analysis for a QSAR study of antimicrobial activity for cephalosporins and design new cephalosporins
Thirty-one cephalosporin compounds were modeled using multivariate image analysis applied to the quantitative structure-activity relationship (MIA-QSAR) approach. The acid dissociation constants (pKa) of cephalosporins play a fundamental role in their mechanism of activity. The antimicrobial activity of cephalosporins was related to their first pKa by different models. B...
Prediction of selectivity index of pentachlorophenol-imprinted polymers
A data set comprising the selectivity index of pentachlorophenol-imprinted polymers against 53 pentachlorophenol and related compounds was obtained from the excellent work of Baggiani et al. Molecular descriptors of the phenol compounds were calculated with EDRAGON to obtain a total of 1,666 descriptors spanning 20 categories of molecular properties. Multivariate analysis of the data set was...
Predictive model selection in partial least squares path modeling
Predictive model selection metrics are used to select models with the highest out-of-sample predictive power among a set of models. R² and related metrics, which are heavily used in partial least squares path modeling, are often mistaken for predictive metrics. We introduce information-theoretic model selection criteria that are designed for out-of-sample prediction and which do not require cre...